Approximate String Matching for Geographic Names and Personal Names
Abstract
The problem of matching strings allowing errors has recently gained importance, considering the increasing volume of online textual data. In geotechnologies, approximate string matching algorithms find many applications, such as gazetteers, address matching, and geographic information retrieval. This paper presents a novel method for approximate string matching, developed for the recognition of geographic and personal names. The method deals with abbreviations, name inversions, stopwords, and omission of parts. Three similarity measures and a method to match individual words considering accent marks and other multilingual aspects were developed. Test results show high precision-recall rates and good overall matching efficiency.
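The abstract names the variations the method handles (abbreviations, name inversions, stopwords, omitted parts, accent marks) but gives no algorithm. The Python sketch below is not the paper's method or any of its three similarity measures. It is a minimal illustration of how such variations are commonly handled, assuming Unicode accent stripping, a hypothetical stopword list, and a plain Levenshtein distance for word-level comparison. All function names are illustrative.

```python
import unicodedata

# Hypothetical stopword list, chosen for illustration only.
STOPWORDS = {"de", "da", "do", "dos", "das", "of", "the"}

def strip_accents(text):
    """Drop combining marks so that 'São' and 'Sao' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tokenize(name):
    """Lowercase, strip accents, treat commas as separators, drop stopwords."""
    cleaned = strip_accents(name).lower().replace(",", " ")
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def word_similarity(a, b):
    """Similarity in [0, 1]; a dotted abbreviation matches any word it prefixes."""
    for short, full in ((a, b), (b, a)):
        if short.endswith(".") and len(short) > 1 and full.startswith(short[:-1]):
            return 1.0
    a, b = a.rstrip("."), b.rstrip(".")
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest

def name_similarity(x, y):
    """Order-insensitive score: each word of the shorter name is paired with
    its best match in the longer one (greedy, not a one-to-one assignment)."""
    tx, ty = tokenize(x), tokenize(y)
    if not tx or not ty:
        return 0.0
    short, long_ = sorted((tx, ty), key=len)
    scores = [max(word_similarity(s, l) for l in long_) for s in short]
    return sum(scores) / len(scores)
```

Averaging over the shorter name's words is one simple way to tolerate omitted parts, and ignoring word order covers inversions such as "Silva, João" versus "João da Silva". A production matcher would likely need a one-to-one word assignment and tuned thresholds, neither of which is attempted here.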
Similar resources
Using Soundex Codes For Indexing Names In ASR Documents
In this paper we highlight the problems that arise due to variations of spellings of names that occur in text, as a result of which links between two pieces of text where the same name is spelt differently may be missed. The problem is particularly pronounced in the case of ASR text. We propose the use of approximate string matching techniques to normalize names in order to overcome the problem...
Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching
Personal names are important and common information in many data sources, ranging from social networks and news articles to patient records and scientific documents. They are often used as queries for retrieving records and also as key information for linking documents from multiple sources. Matching personal names can be challenging due to variations in spelling and various formatting of names...
Matchsimile: a Flexible Approximate Matching Tool for Searching Proper Name
We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names foun...
An approximate matching method for clinical drug names.
OBJECTIVE: To develop an approximate matching method for finding the closest drug names within existing RxNorm content for drug name variants found in local drug formularies. METHODS: We used a drug-centric algorithm to determine the closest strings between the RxNorm data set and local variants which failed the exact and normalized string matching searches. Aggressive measures such as token sp...
Soundex Algorithm for Indian Language Based on Phonetic Matching
In a system with a large database, there has always been the problem that names may be misspelled or not spelled the way one expected, so the data in the database degrades. In that case, duplicates must be found and merged into a single entity. One difficulty in doing so is how the strings should be compared. In such cases rather than lookin...